The Combinatorial BLAS: design, implementation, and applications

Authors

Aydin Buluç

John R. Gilbert

Abstract

This paper presents a scalable, high-performance software library for graph analysis and data mining. Large combinatorial graphs appear in many applications of high-performance computing, including computational biology, informatics, analytics, web search, dynamical systems, and sparse matrix methods. Graph computations are difficult to parallelize with traditional approaches because of their irregular nature and low operational intensity. Many graph computations, however, contain enough coarse-grained parallelism for thousands of processors, and this parallelism can be uncovered by using the right primitives. We describe the parallel Combinatorial BLAS, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications. We provide an extensible library interface and some guiding principles for future development. The library is evaluated on two important graph algorithms, in terms of both performance and ease of use. The scalability and raw performance of the example applications, using the Combinatorial BLAS, are unprecedented on distributed-memory clusters.
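The following example makes the idea of expressing graph algorithms as linear algebra primitives more concrete. It is a minimal Python sketch of one such primitive: a sparse-matrix times sparse-vector product parameterized by a semiring, with a breadth-first search built on top of it. It is not the Combinatorial BLAS API; the library itself is a distributed-memory C++ library. The names Semiring, spmspv_transpose and bfs_parents are hypothetical, the matrix is assumed to be stored as a plain dict of rows, and everything runs serially, so none of the paper's parallel data distribution is modeled.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical representation: row index -> list of (column index, value) pairs.
SparseMatrix = Dict[int, List[Tuple[int, Any]]]
# Sparse vector: only nonzero entries are stored.
SparseVector = Dict[int, Any]


@dataclass(frozen=True)
class Semiring:
    add: Callable[[Any, Any], Any]  # combines contributions to the same output entry
    mul: Callable[[Any, Any], Any]  # combines a vector entry with a matrix entry


def spmspv_transpose(a: SparseMatrix, x: SparseVector, sr: Semiring) -> SparseVector:
    """Compute y = A^T x over the semiring sr, touching only nonzeros of x and A.

    y[j] = sr.add over all i with x[i] and A[i][j] present of sr.mul(x[i], A[i][j]).
    """
    y: SparseVector = {}
    for i, xi in x.items():
        for j, aij in a.get(i, []):
            prod = sr.mul(xi, aij)
            y[j] = sr.add(y[j], prod) if j in y else prod
    return y


def bfs_parents(a: SparseMatrix, source: int) -> Dict[int, int]:
    """Breadth-first search returning a parent for every reachable vertex."""
    # "Multiply" keeps the frontier value (a vertex id); "add" keeps the smallest id,
    # so each newly reached vertex picks its lowest-numbered frontier parent.
    min_select = Semiring(add=min, mul=lambda xi, aij: xi)
    parents = {source: source}
    frontier: SparseVector = {source: source}
    while frontier:
        reached = spmspv_transpose(a, frontier, min_select)
        # Mask out vertices discovered in earlier levels.
        new = {j: p for j, p in reached.items() if j not in parents}
        parents.update(new)
        # Next frontier: each new vertex carries its own id as the value.
        frontier = {j: j for j in new}
    return parents


# Directed edges 0->1, 0->2, 1->3, 2->3, 3->4 stored as a Boolean adjacency matrix.
a: SparseMatrix = {0: [(1, True), (2, True)], 1: [(3, True)], 2: [(3, True)], 3: [(4, True)]}
print(bfs_parents(a, 0))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 3}
```

Changing the semiring changes the algorithm without changing the kernel. With ordinary addition and multiplication, the same spmspv_transpose computes a numeric A^T x. This reuse of one sparse kernel across different graph computations is the design point the abstract alludes to.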

Similar resources

FlexiBLAS - A flexible BLAS library with runtime exchangeable backends

The BLAS library is one of the central libraries for the implementation of numerical algorithms. It serves as the basis for many other numerical libraries like LAPACK, PLASMA or MAGMA (to mention only the most obvious). Thus a fast BLAS implementation is the key ingredient for efficient applications in this area. However, for debugging or benchmarking purposes it is often necessary to replace t...

BLAS on the Trident Processor: Implementation and Performance Evaluation

This paper describes the implementation of the Basic Linear Algebra Subprograms (BLAS), which are widely used in many applications, on the Trident processor. We show how to use the Trident parallel execution units, ring, and communication registers to effectively perform vector-vector, matrix-vector, and matrix-matrix operations needed for implementing BLAS. The TFLOPS rate on infinite-size pro...

A High Performance FPGA-Based Accelerator for BLAS Library Implementation

This paper describes the implementation and the performance analysis of a hardware accelerator for the BLAS library matrix multiplication operation. This accelerator is based on a dual-FPGA board and on an implementation BLAS software library making use of the FPGA-based hardware. In order to evaluate the performance of such a system, we implemented the matrix multiplication operation (BLAS “dg...

Implementation of the BLAS Level 3 and Linpack Benchmark on the AP1000

This paper describes an implementation of Level 3 of the Basic Linear Algebra Subprogram (BLAS-3) library and the Linpack Benchmark on the Fujitsu AP1000. The performance of these applications is regarded as important for distributed memory architectures such as the AP1000. We discuss the techniques involved in optimizing these applications without significantly sacrificing numerical stability....

A Simulated Annealing Algorithm for Unsplittable Capacitated Network Design

The Network Design Problem (NDP) is one of the important problems in combinatorial optimization. Among the network design problems, the Multicommodity Capacitated Network Design (MCND) problem has numerous applications in transportation, logistics, telecommunication, and production systems. The MCND problems with splittable flow variables are NP-hard, which means they require exponential time t...

Journal:

IJHPCA

Volume 25

Publication date: 2011